---
title: October 2023
description: Read about DataRobot's new public preview and generally available features released in October 2023.

---

# October 2023 {: #october-2023 }

_October 25, 2023_

With the latest deployment, DataRobot's AI Platform delivered the new GA and Public Preview features listed below. From the release center, you can also access:

* [Monthly deployment announcement history](cloud-history/index)
* [Public preview features](public-preview/index)
* [Self-Managed AI Platform release notes](archive-release-notes/index)

### October release {: #october-release }

The following table lists each new feature:

??? abstract "Features grouped by capability"

    Name       |  GA | Public Preview
    ---------- | ---- | ---
    **Data** |  :~~:  | :~~:
    [Speed improvements to Relationship Quality Assessment](#speed-improvements-to-the-relationship-quality-assessment) | ✔ |  
    [Snowflake key pair authentication](#snowflake-key-pair-authentication) | ✔ |  
    [AWS S3 connection enhancements](#aws-s3-connection-enhancements) | | ✔
    [Broader support for Azure Databricks added to Workbench](#broader-support-for-azure-databricks-added-to-workbench) | | ✔
    **Modeling** |  :~~:  | :~~:
    [Document AI brings PDF documents as a data source](#document-ai-brings-pdf-documents-as-a-data-source) |✔ |   
    [Prediction Explanations for cluster models now GA](#prediction-explanations-for-cluster-models-now-ga) |✔ |   
    [GPU improvements enhance training for deep learning models](#gpu-improvements-enhance-training-for-deep-learning-models) | | ✔  
    [SHAP Prediction Explanations now in Workbench](#shap-prediction-explanations-now-in-workbench) | | ✔
    **Applications** |  :~~:  | :~~:
    [New app experience in Workbench](#new-app-experience-in-workbench) | ✔ |  
    **Predictions and MLOps** |  :~~:  | :~~:
    [Model package artifact creation workflow](#model-package-artifact-creation-workflow) | ✔ |
    [Versioning support in the new Model Registry](#versioning-support-in-the-new-model-registry) | ✔ |
    [Extend compliance documentation with key values](#extend-compliance-documentation-with-key-values) | ✔ |
    [Public network access for custom models](#public-network-access-for-custom-models) | ✔ |
    [Predictions on training data in Workbench](#predictions-on-training-data-in-workbench) | ✔ |
    [Custom model deployment status information](#custom-model-deployment-status-information) | ✔ |
    [Auto-sampling for client side aggregation](#auto-sampling-for-client-side-aggregation) | ✔ |
    [New operators for Apache Airflow](#new-operators-for-apache-airflow) | ✔ |
    [Databricks JDBC write-back support for batch predictions](#databricks-jdbc-write-back-support-for-batch-predictions) | ✔ |  
    [Batch monitoring for deployment predictions](#batch-monitoring-for-deployment-predictions) |  | ✔
    [Accuracy for monitoring jobs with aggregation enabled](#accuracy-for-monitoring-jobs-with-aggregation-enabled) |  | ✔
    **Notebooks** |  :~~:  | :~~:
    [Schedule notebook jobs](#schedule-notebook-jobs) |  | ✔
    [Custom environment images for DataRobot Notebooks](#custom-environment-images-for-datarobot-notebooks) |  | ✔


### GA {: #ga }

#### Document AI brings PDF documents as a data source {: #document-ai-brings-pdf-documents-as-a-data-source }

Available in DataRobot Classic, [Document AI](doc-ai/index) is now GA, providing a way to build models on raw PDF documents without additional, manually intensive data preparation steps. Addressing the issues of information spread out in a large corpus and other barriers to efficient use of documents as a data source, Document AI eases data prep and provides insights for PDF-based models.

![](images/rn-docai-13a.png)


#### Prediction Explanations for cluster models now GA {: #prediction-explanations-for-cluster-models-now-ga }

Prediction Explanations with clustering uncover which factors most contributed to any given row’s cluster assignment. Now generally available, this insight helps you to easily explain clustering model outcomes to stakeholders and identify high-impact factors to help focus business strategies.

Functioning very much like multiclass Prediction Explanations—but reporting on clusters instead of classes—cluster explanations are available from both the Leaderboard and deployments. They are available for all XEMP-based clustering projects and are not available with time series.

![](images/pe-clustering-3.png)


#### Model package artifact creation workflow {: #model-package-artifact-creation-workflow }

Now generally available, the improved model package artifact creation workflow provides a clearer and more consistent path to model deployment with visible connections between a model and its associated model packages in the Model Registry. Using this new approach, when you deploy a model, you begin by providing model details and registering the model. Then, after you create the model package and allow the build to complete, you can deploy the model by [adding the deployment information](add-deploy-info).

1. On the **Leaderboard**, select the model to use for generating predictions. DataRobot recommends a model with the **Recommended for Deployment** and **Prepared for Deployment** badges. Click **Predict > Deploy**. If the Leaderboard model you select doesn't have the **Prepared for Deployment** badge, DataRobot recommends you click **Prepare for Deployment** to run the [model preparation](model-rec-process#prepare-a-model-for-deployment) process for that model.

    ![](images/rn-predict-deploy.png)

2. On the **Deploy model** tab, provide the required model package information, and then click **Register to deploy**.

    ![](images/rn-register-to-deploy.png)

3. Allow the model to build. The **Building** status can take a few minutes, depending on the size of the model. A model package must have a **Status** of **Ready** before you can deploy it.

    ![](images/rn-model-artifact-creation-build.png)

4. In the **Model Packages** list, locate the model package you want to deploy and click **Deploy**.

    ![](images/rn-model-package-deploy.png)

5. Add [deployment information and create the deployment](add-deploy-info).

For more information, see the [documentation](deploy-model).

#### Versioning support in the new Model Registry {: #versioning-support-in-the-new-model-registry }

Now generally available for [app.eu.datarobot.com](https://app.eu.datarobot.com/) users, the new Model Registry is an organizational hub for the variety of models used in DataRobot. Models are registered as deployment-ready model packages. These model packages are grouped into _registered models_ containing _registered model versions_, allowing you to categorize them based on the business problem they solve. Registered models can contain DataRobot, custom, external, challenger, and automatically retrained models as versions.

During this update, packages from the **Model Registry > Model Packages** tab are converted to registered models and migrated to the new **Registered Models** tab. Each migrated registered model contains a registered model version, and the original packages can be identified in the new tab by the model package ID appended to the registered model name.

Once the migration is complete, in the updated **Model Registry**, you can track the evolution of your predictive and generative models with new versioning functionality and centralized management. In addition, you can access both the original model and any associated deployments and share your registered models (and the versions they contain) with other users.

![](images/rn-reg-models-page.png)

This update builds on the [previous model package workflow changes](#model-package-artifact-creation-workflow), requiring the registration of any model you intend to deploy. To register and deploy a model from the Leaderboard, you must first provide model registration details:

1. On the **Leaderboard**, select the model to use for generating predictions. DataRobot recommends a model with the **Recommended for Deployment** and **Prepared for Deployment** badges. The [model preparation](model-rec-process) process runs feature impact, retrains the model on a reduced feature list, and trains on a higher sample size, followed by the entire sample (latest data for date/time partitioned projects).

    ![](images/rn-prepared-for-deployment.png)

2. Click **Predict > Deploy**. If the Leaderboard model doesn't have the **Prepared for Deployment** badge, DataRobot recommends you click **Prepare for Deployment** to run the [model preparation](model-rec-process#prepare-a-model-for-deployment) process for that model.

    ![](images/rn-prepare-for-deployment-process.png)

    !!! tip
        If you've already added the model to the Model Registry, the registered model version appears in the **Model Versions** list. You can click **Deploy** next to the model and skip the rest of this process.

3. Under **Deploy model**, click **Register to deploy**.

    ![](images/rn-reg-dr-model.png)

4. In the **Register new model** dialog box, provide the required model package information:

    ![](images/rn-reg-model-fields.png)

5. Click **Add to registry**. The model opens on the **Model Registry > Registered Models** tab.

6. While the registered model builds, click **Deploy** and then [configure the deployment settings](add-deploy-info).

    ![](images/rn-model-artifact-creation-building.png)

For more information, see the [documentation](model-registry).

#### Extend compliance documentation with key values {: #extend-compliance-documentation-with-key-values }

Now generally available, you can create key values to reference in compliance documentation templates. Adding a key value reference includes the associated data in the generated template, limiting the manual editing needed to complete the compliance documentation. Key values associated with a model in the Model Registry are key-value pairs containing information about the registered model package:

![](images/key-values-tab.png)

When you [build custom compliance documentation templates](template-builder), you can include string, numeric, boolean, image, and dataset key values:

![](images/add-kv-to-template.png)

Then, when you [generate compliance documentation for a model package](reg-compliance) with a custom template referencing a supported key value, DataRobot inserts the matching values from the associated model package; for example, if the key value has an image attached, that image is inserted.

For more information, see the [documentation](reg-key-values).

#### Public network access for custom models {: #public-network-access-for-custom-models }

Now generally available as a premium feature, you can enable full network access for any custom model. When you create a custom model, you can access any fully qualified domain name (FQDN) in a public network so that the model can leverage third-party services. Alternatively, you can disable public network access if you want to isolate a model from the network and block outgoing traffic to enhance the security of the model. To review this access setting for your custom models, on the **Assemble** tab, under **Resource Settings**, check the **Network access** setting:

![](images/network-access-setting.png)

For more information, see the [documentation](custom-model-resource-mgmt).

#### Predictions on training data in Workbench {: #predictions-on-training-data-in-workbench }

Now generally available in Workbench, after you create an experiment and train models, you can make predictions on training data from **Model actions** > :material-chart-scatter-plot: **Make predictions**:

![](images/wb-model-action-pred.png)

When you make predictions on training data, you can select one of the following options, depending on the project type:

![](images/wb-pred-source-training-data.png)

Project type    | Options
----------------|--------
AutoML          | Select one of the following training data options: <ul><li>**Validation**</li><li>**Holdout**</li><li>**All data**</li></ul>
OTV/Time Series | Select one of the following training data options: <ul><li>**All backtests**</li><li>**Holdout**</li></ul>

!!! warning "In-sample prediction risk"
    Depending on the option you select and the sample size the model was trained on, predicting on training data can generate in-sample predictions, meaning that the model has seen the target value during training and its predictions do not necessarily generalize well. If DataRobot determines that one or more training rows are used for predictions, the **Overfitting risk** warning appears. These predictions should not be used to evaluate the model's accuracy.

For more information, see the [documentation](wb-predict).

#### Custom model deployment status information {: #custom-model-deployment-status-information }

Now generally available, when you deploy a custom model in DataRobot, deployment status information is surfaced through new badges in the **Deployments** inventory, warnings in the deployment, and events in the **MLOps Logs**.

After you [add deployment information and deploy a custom model](add-deploy-info), the **Creating deployment** modal appears, tracking the status of the deployment creation process, including the application of deployment settings and the calculation of the drift baseline. You can monitor the deployment progress from the modal, allowing you to access the [**Check deployment's MLOps logs**](service-health#view-mlops-logs) link if an error occurs:

![](images/deploy-custom-model-progress-modals.png)

In the **Deployments** inventory, you can see the following deployment status values in the **Deployment Name** column:

![](images/custom-model-deploy-status.png)

Badge          | Description
---------------|---------------
![](images/mgmt-agent-launch.png){: style="height:20px; width:auto"} | The custom model deployment process is still in progress. You can't currently make predictions through this deployment or access deployment tabs that require an active deployment.
![](images/cus-model-warn.png){: style="height:20px; width:auto"}   | The custom model deployment process completed with errors. You may be unable to make predictions through this deployment. In addition, if you deactivate this deployment, you can't reactivate it until you resolve the deployment errors. You should check the [MLOps Logs](service-health#view-mlops-logs) to troubleshoot the custom model deployment.
![](images/mgmt-agent-error.png){: style="height:20px; width:auto"}  | The custom model deployment process failed, and the deployment is **Inactive**. You can't currently make predictions through this deployment or access deployment tabs that require an active deployment. You should check the [MLOps Logs](service-health#view-mlops-logs) to troubleshoot the custom model deployment.

From a deployment with an **Errored** or **Warning** status, you can access the **Service Health MLOps logs** link from the warning on any tab. This link takes you directly to the **Service Health** tab:

![](images/custom-model-deploy-warning-link.png)

On the **Service Health** tab, under **Recent Activity**, you can click the **MLOps Logs** tab to view the **Event Details**. In the **Event Details**, you can click :fontawesome-solid-code:{.lg} **View logs** to access the [custom model deployment logs](deploy-custom-inf-model#deployment-logs) to diagnose the cause of the error:

![](images/custom-model-deploy-logs.png)

#### Auto-sampling for client side aggregation {: #auto-sampling-for-client-side-aggregation }

Now generally available, [large-scale monitoring](agent-use#enable-large-scale-monitoring) with the [monitoring agent](monitoring-agent/index) supports automatic sampling of raw features, predictions, and actuals for challenger analysis and accuracy tracking. To enable this feature, when configuring large-scale monitoring, set the `MLOPS_STATS_AGGREGATION_AUTO_SAMPLING_PERCENTAGE` environment variable to the percentage of raw data to report to DataRobot using algorithmic sampling. In addition, set `MLOPS_ASSOCIATION_ID_COLUMN_NAME` to identify the column in the input data containing the data for sampling.
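As a configuration sketch, the two variables can be set in the environment of the monitored process before the MLOps library initializes. The values below are illustrative placeholders, and `transaction_id` is a hypothetical association ID column:

```python
import os

# Report ~10% of raw rows to DataRobot via algorithmic sampling
# (placeholder value -- tune for your workload).
os.environ["MLOPS_STATS_AGGREGATION_AUTO_SAMPLING_PERCENTAGE"] = "10"

# Column in the input data identifying the rows to sample;
# "transaction_id" is a hypothetical column name.
os.environ["MLOPS_ASSOCIATION_ID_COLUMN_NAME"] = "transaction_id"
```

Both variables must be set before the MLOps library starts so that aggregation and sampling are applied from the first reported prediction.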

For more information, see the [documentation](agent-use#enable-large-scale-monitoring).

#### New operators for Apache Airflow {: #new-operators-for-apache-airflow }

You can combine the capabilities of DataRobot MLOps and Apache Airflow to [implement a reliable solution](apache-airflow) for retraining and redeploying your models; for example, you can retrain and redeploy your models on a schedule, on model performance degradation, or using a sensor that triggers the pipeline in the presence of new data.

The DataRobot provider for Apache Airflow now includes new operators:

* `StartAutopilotOperator`: Triggers DataRobot Autopilot to train a set of models.
* `CreateExecutionEnvironmentOperator`: Creates an execution environment.
* `CreateCustomInferenceModelOperator`: Creates a custom inference model.
* `GetDeploymentModelOperator`: Retrieves information about the deployment's current model.

For more information about the new operators, reference the [documentation](https://github.com/datarobot/airflow-provider-datarobot#modules){ target=_blank }.
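As a rough sketch, a scheduled retraining DAG might combine two of the new operators as follows. The import path and operator parameter names are assumptions based on the provider's conventions, so verify them against the provider README before use:

```python
from datetime import datetime

from airflow import DAG

# Assumed import path -- check the airflow-provider-datarobot README
# for the exact module and operator names.
from datarobot_provider.operators.datarobot import (
    GetDeploymentModelOperator,
    StartAutopilotOperator,
)

with DAG(
    dag_id="datarobot_retrain_example",
    start_date=datetime(2023, 10, 1),
    schedule="@weekly",  # Airflow 2.4+; use schedule_interval on older versions
    catchup=False,
) as dag:
    # Inspect which model currently backs the deployment
    # (parameter names are hypothetical).
    current_model = GetDeploymentModelOperator(
        task_id="get_deployment_model",
        deployment_id="{{ var.value.deployment_id }}",
    )

    # Trigger Autopilot to train a fresh set of models.
    retrain = StartAutopilotOperator(
        task_id="start_autopilot",
        project_id="{{ var.value.project_id }}",
    )

    current_model >> retrain
```

The same operators can be wired behind a sensor so the pipeline fires only when new data lands, as described above.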

#### Databricks JDBC write-back support for batch predictions {: #databricks-jdbc-write-back-support-for-batch-predictions }

With this release, Databricks is supported as a JDBC [data source](data-conn) for batch predictions. For more information on supported data sources for batch predictions, see the [documentation](batch-prediction-api/index#data-sources-supported-for-batch-predictions).
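For a rough idea of what this enables, the dictionaries below sketch the shape of JDBC intake and output settings for a batch prediction job that reads from and writes back to Databricks. The field names follow the Batch Prediction API's JDBC adapter; all IDs and table names are placeholders, so confirm the exact fields in the batch predictions documentation:

```python
# Read scoring data from a Databricks table via JDBC.
# All IDs and names below are placeholders.
intake_settings = {
    "type": "jdbc",
    "data_store_id": "<databricks-data-store-id>",
    "credential_id": "<stored-credential-id>",
    "query": "SELECT * FROM scoring_input",
}

# Write predictions back to a Databricks table via JDBC.
output_settings = {
    "type": "jdbc",
    "data_store_id": "<databricks-data-store-id>",
    "credential_id": "<stored-credential-id>",
    "table": "scoring_output",
    "statement_type": "insert",
}
```

These settings would then be passed to a batch prediction job (for example, through the DataRobot Python client's batch prediction interface).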


#### Speed improvements to Relationship Quality Assessment {: #speed-improvements-to-the-relationship-quality-assessment }

Now generally available for SaaS users, speed improvements reduce [Relationship Quality Assessment](fd-overview#test-relationship-quality) run times: DataRobot subsamples approximately 10% of the primary dataset, speeding up the computation without impacting the accuracy of the enrichment rate estimation or the results of the assessment. After the assessment completes, the sampling percentage is included at the top of the report.

![](images/rqa-speed-1.png)

#### Snowflake key pair authentication {: #snowflake-key-pair-authentication }

Now generally available, you can create a [Snowflake data connection](dc-snowflake#key-pair) in DataRobot Classic and Workbench using the key pair authentication method—a Snowflake username and private key—as an alternative to basic and OAuth authentication. This also allows you to [share secure configurations](secure-config) for key pair authentication.
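Key pair authentication relies on an RSA key pair: the private key is supplied to DataRobot alongside your Snowflake username, and the matching public key is registered on the Snowflake user. As a sketch (assuming the `cryptography` package is available), the pair might be generated like this:

```python
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.primitives.asymmetric import rsa

# Generate a 2048-bit RSA key pair.
key = rsa.generate_private_key(public_exponent=65537, key_size=2048)

# Snowflake expects the private key in PKCS#8 PEM format; it is left
# unencrypted here for brevity -- production keys should be encrypted.
private_pem = key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)

# The public half is registered on the Snowflake user, e.g.:
#   ALTER USER my_user SET RSA_PUBLIC_KEY='<key body>';
public_pem = key.public_key().public_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PublicFormat.SubjectPublicKeyInfo,
)
```

The private key, together with the Snowflake username, then forms the credential used when configuring the DataRobot connection.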


#### New app experience in Workbench {: #new-app-experience-in-workbench }

Now generally available, DataRobot introduces a new, [streamlined application experience](wb-app-edit) in Workbench that makes it easy to view, explore, and create snapshots of information. This release introduces the following improvements:

- Applications have a new, simplified interface and [creation workflow](wb-app-create) to make the experience more intuitive.
- Application creation automatically generates insights, like Feature Impact and ROC Curve, based on the model powering your application.
- Applications created from an experiment in Workbench no longer open outside of Workbench in the application builder.

![](images/wb-app-present.png)

### Public Preview {: #public-preview }

#### GPU improvements enhance training for deep learning models {: #gpu-improvements-enhance-training-for-deep-learning-models }

This deployment brings several enhancements to the public preview GPU feature, including:

* Additional blueprints are now available for GPU training: MiniLM, RoBERTa, and TinyBERT featurizers.

* Depending on the project:
	* Keras Text Convolutional Neural Network blueprints may train during Quick Autopilot.
	* Image Finetuner blueprints may train during full Autopilot.

* GPU and CPU variants are now available in the repository, allowing a choice of which worker type to train on.

* GPU variant blueprints are optimized to train faster on GPU workers.

Public preview [documentation](gpus).

**Feature flag OFF by default**: Enable GPU Workers

#### SHAP Prediction Explanations now in Workbench {: #shap-prediction-explanations-now-in-workbench }

SHAP Prediction Explanations estimate how much each feature contributes to a given prediction, reported as its difference from the average. They are intuitive, unbounded (computed for all features), fast, and, due to the open source nature of SHAP, transparent. With this deployment, SHAP explanations are supported in Workbench for all non-time series experiments. Accessed from the Model overview tab, SHAP explanations provide a preview for a general "intuition" of model performance with an option to view explanations for the entire dataset.

![](images/wb-exp-eval-34.png)

Public preview [documentation](ml-experiment-evaluate#shap-prediction-explanations).

**Feature flag ON by default**: SHAP in Workbench

#### Broader support for Azure Databricks added to Workbench {: #broader-support-for-azure-databricks-added-to-workbench }

Now available for public preview, the following support for Azure Databricks has been added to Workbench:

- Data added via a connection is registered as a dynamic dataset.
- View data in a live preview sampled directly from the source data in Azure Databricks.
- Perform wrangling on Azure Databricks datasets.
- Materialize published wrangling recipes in the Data Registry as well as in Azure Databricks.

Public preview [documentation](wb-databricks).

**Feature flags:**

- Enable Databricks Driver
- Enable Databricks Wrangling
- Enable Databricks In-Source Materialization in Workbench
- Enable Dynamic Datasets in Workbench

#### AWS S3 connection enhancements {: #aws-s3-connection-enhancements }

A new AWS S3 connector is now available for public preview, providing several performance enhancements as well as support for temporary credentials and Parquet file ingest.

Public preview [documentation](dc-s3#aws-s3).

**Feature flag:** Enable S3 Connector

#### Batch monitoring for deployment predictions {: #batch-monitoring-for-deployment-predictions }

Now available for public preview, you can view monitoring statistics organized by batch, instead of by time. With batch-enabled deployments, you can access the **Predictions > Batch Management** tab, where you can create and manage batches. You can then add predictions to those batches and view service health, data drift, accuracy, and custom metric statistics by batch in your deployment. To create batches and assign predictions to a batch, you can use the UI or the API. In addition, each time a batch prediction or scheduled batch prediction job runs, a batch is created automatically, and every prediction from the job is added to that batch.

![](images/rn-batch-monitoring.png)

**Feature flags OFF by default**: Enable Deployment Batch Monitoring, Enable Batch Custom Metrics for Deployments

Public preview [documentation](deploy-batch-monitor).

#### Accuracy for monitoring jobs with aggregation enabled {: #accuracy-for-monitoring-jobs-with-aggregation-enabled }

Now available for public preview, [monitoring jobs](pred-monitoring-jobs/index) for external models [with aggregation enabled](agent-use#enable-large-scale-monitoring) support accuracy tracking. Enable **Use aggregation** and configure the retention settings to indicate that data is aggregated by the MLOps library and to define how much raw data to retain for challengers and accuracy analysis; then, to report the **Actuals value column** for accuracy monitoring, define the **Predictions column** and **Association ID column**.

![](images/aggregation-options.png)

**Feature flag OFF by default**: Enable Accuracy Aggregation

For more information, see the [documentation](ui-monitoring-jobs#set-aggregation-options).

#### Schedule notebook jobs {: #schedule-notebook-jobs }

Now available for public preview, you can automate your code-based workflows by scheduling notebooks to run in non-interactive mode. Notebook scheduling is managed by notebook jobs that you can create directly from the DataRobot Notebooks interface. Additionally, you can parameterize a notebook job to enhance the automation experience enabled by notebook scheduling. By defining certain values in a notebook as parameters, you can provide inputs for those parameters when a notebook job runs instead of continuously modifying the notebook itself to change the values for each run.

![](images/nb-sched-5.png)

Public preview [documentation](wb-schedule-nb).

**Feature flag OFF by default**: Enable Notebooks Scheduling

#### Custom environment images for DataRobot Notebooks {: #custom-environment-images-for-datarobot-notebooks }

Now available for public preview, you can integrate DataRobot Notebooks with [DataRobot custom environments](custom-model-environments/index) that define reusable and custom Docker images used to run notebook sessions. You can create a custom environment to use for your notebook sessions if you want full control over the environment, and to leverage reproducible dependencies beyond those available in the built-in images. Compatible custom environments are selectable directly from the notebook interface. DataRobot Notebooks support Python and R custom environments.

![](images/nb-custom-env-2.png)

Public preview [documentation](dr-env-nb#custom-environment-images).

**Feature flag OFF by default**: Enable Notebooks Custom Environments

_All product and company names are trademarks&trade; or registered&reg; trademarks of their respective holders. Use of them does not imply any affiliation with or endorsement by them_.
